Fix SDPA TT_METAL_WATCHER issues by pavlejosipovic · Pull Request #37928 · tenstorrent/tt-metal

pavlejosipovic · 2026-02-16T10:39:29Z

Summary

- `generate_reduce_scaler` hardcoded 2048 bytes and 4 faces, assuming full 32x32 bf16 tiles. When circular buffers use half tiles (1024 B, 2 faces), this overwrites adjacent L1 memory, causing watcher-detected corruption.
- Restore the `half_tile` template parameter so the zero-fill size and face iteration adapt to the actual tile dimensions. Also fix the idle-core runtime args count mismatch in `sdpa_decode_program_factory`.
- Remove watcher skips in `test_sdpa_prefill.py` (now passing with the fix).
- Restore watcher skip in `test_flash_mla.py` for Blackhole (separate issue).
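
To make the sizing problem concrete, here is a minimal host-side sketch in C++. It is not the kernel code: `ScalerTileShape`, `fill_scaler_tile`, and `neighbour_survives` are hypothetical names, "L1" is modelled as a plain array, and the per-face write pattern (scaler in the first row of each 16x16 face) is an assumption made for illustration. Only the 2048 B / 4 faces and 1024 B / 2 faces figures come from the description above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Host-side model only; names and layout are illustrative assumptions.
constexpr std::size_t kFaceElems = 16 * 16;                              // one 16x16 face
constexpr std::size_t kFaceBytes = kFaceElems * sizeof(std::uint16_t);   // bf16 -> 512 B

// The template flag selects how many faces (and therefore bytes) are touched.
template <bool half_tile>
struct ScalerTileShape {
    static constexpr std::size_t num_faces = half_tile ? 2 : 4;
    static constexpr std::size_t bytes = num_faces * kFaceBytes;
};

static_assert(ScalerTileShape<false>::bytes == 2048, "full tile is 2048 B");
static_assert(ScalerTileShape<true>::bytes == 1024, "half tile is 1024 B");

// Zero exactly one tile of the selected shape, then write the scaler into
// the first row of each face (assumed pattern, see lead-in).
template <bool half_tile>
void fill_scaler_tile(std::uint16_t* tile, std::uint16_t scaler) {
    using Shape = ScalerTileShape<half_tile>;
    assert(tile != nullptr);
    std::memset(tile, 0, Shape::bytes);
    for (std::size_t face = 0; face < Shape::num_faces; ++face) {
        std::uint16_t* face_base = tile + face * kFaceElems;
        for (std::size_t col = 0; col < 16; ++col) {
            face_base[col] = scaler;
        }
    }
}

// Models a 1024 B half-tile buffer followed by 1024 B owned by someone else.
// Returns true if the neighbouring region is untouched after the fill.
template <bool half_tile>
bool neighbour_survives() {
    std::array<std::uint16_t, 1024> l1{};  // 2048 B total, so no real overflow here
    for (std::size_t i = 512; i < 1024; ++i) {
        l1[i] = 0xBEEF;  // sentinel in the neighbouring region
    }
    fill_scaler_tile<half_tile>(l1.data(), 0x3F80);  // 0x3F80 is 1.0 in bf16
    for (std::size_t i = 512; i < 1024; ++i) {
        if (l1[i] != 0xBEEF) {
            return false;
        }
    }
    return true;
}
```

In this model, `neighbour_survives<true>()` leaves the sentinel intact, while `neighbour_survives<false>()` zeroes it, which is the kind of adjacent-memory overwrite the watcher reported.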

Replaces #37833 (closed due to bad rebase)

Test plan

🤖 Generated with Claude Code

`generate_reduce_scaler` hardcoded 2048 bytes and 4 faces, assuming full 32x32 bf16 tiles. When circular buffers use half tiles (1024B, 2 faces), this overwrites adjacent L1 memory causing watcher-detected corruption.

Restore the `half_tile` template parameter (previously removed in cleanup) so the zero-fill size and face iteration adapt to the actual tile dimensions. Also fix idle core runtime args count mismatch in sdpa_decode_program_factory.

Fixes: #37631
Fixes: #29225

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…pa_prefill.py

The watcher skip for issue #37631 was prematurely removed. Restore it until the underlying issue is fully resolved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pavlejosipovic · 2026-02-16T10:42:20Z

/codeowners ping

tenstorrent-github-bot · 2026-02-16T10:42:59Z

CodeOwners Group Analysis

This PR requires approval from one member of each of the following groups:

Summary: 2 pending groups, 0 approved groups

Group Information:

⏳ tenstorrent/metalium-developers-ttnn-core (Team) - Members: Pavlo Hilei, Brian Liu, Joseph Chu, Artem Yerofieiev, Diego Gomez | Pending approval
📁 Files owned by this team (1 files)
- ttnn/cpp/ttnn/kernel/dataflow/generate_reduce_scaler.hpp

⏳ tenstorrent/metallium-maintainers-llama-models (Team) - Members: Raymond Kim, Colman Glagovich, Evan Smal, Jonathan Su, Harry Andrews | Pending approval
📁 Files owned by this team (3 files)

Note: At least one approval from each group is sufficient.

tenstorrent-github-bot · 2026-02-16T10:43:06Z

Hi Evan Smal (@esmalTT), Raymond Kim (@tt-rkim), this PR Fix SDPA TT_METAL_WATCHER issues by Pavle Josipović (@pavlejosipovic) needs your approval/review to merge this.

Copilot

Pull request overview

This PR fixes TT_METAL_WATCHER-detected corruption/issues in SDPA decode by ensuring the reduce-scaler generation logic matches the actual circular buffer tile size (full vs half tiles) and by aligning idle-core runtime argument counts with what the decode reader kernel expects. It also updates SDPA prefill unit tests to remove watcher skips now that the underlying issue is addressed.

Changes:

- Restore half-tile awareness for `generate_reduce_scaler` and pass the correct half/full-tile mode from the SDPA decode writer kernel.
- Fix idle-core reader runtime-arg vector length in `SdpaDecodeProgramFactory` to match the reader kernel’s expected arg reads.
- Remove `TT_METAL_WATCHER` skip decorators from `test_sdpa_prefill.py` (per PR description: now passing with the fix).
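
The runtime-arg mismatch can be pictured with a small host-side sketch in C++. Everything here is hypothetical: `kReaderNumRuntimeArgs` (and its value of 12), `checked_arg`, `make_idle_core_args`, and `reader_prologue_ok` are illustrative names, not the tt-metal API, and the sketch assumes the reader kernel reads its full argument list even on idle cores. The idea is simply that the idle-core vector is sized from the same constant the kernel-side reads are counted against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical count of runtime args the reader reads; the value is made up.
constexpr std::size_t kReaderNumRuntimeArgs = 12;

// Models a bounds-checked argument read: throws where an out-of-bounds
// runtime-arg access would be flagged.
std::uint32_t checked_arg(const std::vector<std::uint32_t>& args, std::size_t idx) {
    if (idx >= args.size()) {
        throw std::out_of_range("runtime arg index out of range");
    }
    return args[idx];
}

// Idle cores get an all-zero vector whose length comes from the shared constant,
// so it cannot drift from what the kernel side expects.
std::vector<std::uint32_t> make_idle_core_args() {
    std::vector<std::uint32_t> args(kReaderNumRuntimeArgs, 0u);
    assert(args.size() == kReaderNumRuntimeArgs);
    return args;
}

// Models the kernel prologue under the stated assumption: every argument is
// read once. Returns false if any read would be out of bounds.
bool reader_prologue_ok(const std::vector<std::uint32_t>& args) {
    try {
        for (std::size_t i = 0; i < kReaderNumRuntimeArgs; ++i) {
            (void)checked_arg(args, i);
        }
    } catch (const std::out_of_range&) {
        return false;
    }
    return true;
}
```

With this model, a vector built by `make_idle_core_args()` passes the prologue, while one that is a single entry short fails it.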

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `ttnn/cpp/ttnn/operations/transformer/sdpa_decode/device/sdpa_decode_program_factory.cpp` | Fix idle-core reader runtime-arg count to prevent watcher OOB runtime-arg access. |
| `ttnn/cpp/ttnn/operations/transformer/sdpa_decode/device/kernels/dataflow/writer_decode_all.cpp` | Detect half-tile scalar CBs and invoke `generate_reduce_scaler` with the correct template mode. |
| `ttnn/cpp/ttnn/kernel/dataflow/generate_reduce_scaler.hpp` | Reintroduce `half_tile` template parameter to size the zero-fill and face-looping correctly. |
| `tests/ttnn/unit_tests/operations/sdpa/test_sdpa_prefill.py` | Remove watcher-enabled skips now that corruption/OOM issues should be resolved by the kernel fix. |
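
On the caller side, "detect half-tile and pick the template mode" could look roughly like the C++ sketch below. This is an assumption about the shape of the dispatch, not the code in `writer_decode_all.cpp`: `scaler_fill_bytes` and `fill_bytes_for_cb` are hypothetical names, and the tile size is passed in as a plain integer rather than queried from a circular buffer.

```cpp
#include <cassert>
#include <cstddef>

// Tile sizes taken from the PR description (bf16 tiles).
constexpr std::size_t kFullTileBytes = 2048;
constexpr std::size_t kHalfTileBytes = 1024;

// Hypothetical stand-in for the templated generator: reports how many bytes
// the selected mode would zero-fill.
template <bool half_tile>
constexpr std::size_t scaler_fill_bytes() {
    return half_tile ? kHalfTileBytes : kFullTileBytes;
}

// Chooses the template mode from the buffer's tile size, so the fill never
// exceeds what the buffer actually holds.
std::size_t fill_bytes_for_cb(std::size_t cb_tile_bytes) {
    assert(cb_tile_bytes == kFullTileBytes || cb_tile_bytes == kHalfTileBytes);
    return cb_tile_bytes == kHalfTileBytes ? scaler_fill_bytes<true>()
                                           : scaler_fill_bytes<false>();
}
```

The point of the dispatch is that the fill size always equals the tile size of the buffer being written, for both supported shapes.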

Pavle Josipovic and others added 3 commits February 16, 2026 10:37

Remove watcher skips in tests/ttnn/unit_tests/operations/sdpa/test_sdpa_prefill.py

0ade3dc

Revert test_flash_mla.py: restore watcher skip for Blackhole

cd160f8


Copilot AI review requested due to automatic review settings February 16, 2026 10:39

pavlejosipovic requested review from a team as code owners February 16, 2026 10:39

Copilot started reviewing on behalf of pavlejosipovic February 16, 2026 10:40

Copilot AI reviewed Feb 16, 2026


pavlejosipovic wants to merge 3 commits into `main` from `pjosipovic/sdpa_watcher_fixes`


2 participants
